Equivalence, non-inferiority and superiority testing

an interactive visualization

Created by Kristoffer Magnusson

It is not uncommon to see researchers conclude that two treatments are equally effective based on a non-significant test of the null hypothesis, or that shortening a treatment yields effects that are no worse than the standard (longer) treatment based on p > 0.05. Both conclusions are wrong: failing to reject the null hypothesis of no difference is not evidence that the difference is zero or negligible. Much has been written about this, and in medicine the appropriate tests for these kinds of hypotheses are equivalence and non-inferiority tests. When testing for equivalence, we test whether a treatment effect lies inside a prespecified equivalence margin [-Δ, Δ]. Similarly, when testing whether a treatment is no worse than another treatment, we test whether the effect is above a prespecified non-inferiority margin -Δ. My aim with this visualization is to show the decision rules associated with these different types of hypotheses. It also shows how power relates to the different tests and to different values of Δ, d and n.

Below I use a 95 % confidence interval to demonstrate the different hypotheses. You can move the CI around using the sliders or by clicking and dragging. Results of the test of treatment differences will automatically be highlighted.

Settings

Observed effect (d = 1): slider from -1.5 to 1.5
Sample size (n = 10): slider from 1 to 1,000
Margin (Δ = 0.3): slider from 0 to 1

Effect of new treatment is inconclusive

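[Figure: 95 % CI for the effect (d), shown here at 0.5 with limits -0.38 and 1.38, on an axis from -1.5 to 1.5. Left favors the old treatment, right favors the new treatment. The regions are labeled Inferior, Non-inferior, Equivalence and Superior relative to the margin Δ.]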

Power

Superiority
H0: d = 0
Ha: d > 0

Non-inferiority
H0: d ≤ -Δ
Ha: d > -Δ

Equivalence
H0: d ≤ -Δ or d ≥ Δ
Ha: -Δ < d < Δ

[Figure: one panel per test, each plotting power (0 to 100) against n per group (0 to 1,000).]

Technical notes

Power is calculated using the power functions below. Note that α is 0.025 for all tests, since a 95 % CI is used. Normal approximations are also used, so power will be slightly off for very small sample sizes.

Power of equivalence test

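$$1-\beta = \Phi\left(\frac{|d-\Delta|}{\sqrt{2/n}} - Z_{1-\alpha}\right) + \Phi\left(\frac{|d+\Delta|}{\sqrt{2/n}} - Z_{1-\alpha}\right) - 1$$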

Power of non-inferiority test

$$1-\beta = \Phi\left(\frac{d+\Delta}{\sqrt{2/n}} - Z_{1-\alpha}\right)$$

Power of superiority test

$$1-\beta = \Phi\left(\frac{d}{\sqrt{2/n}} - Z_{1-\alpha}\right)$$

where Φ is the cumulative distribution function of the standard normal distribution, d is Cohen's d, Δ is the non-inferiority or equivalence margin, n is the sample size per group, and $Z_{1-\alpha}$ is the $100(1-\alpha)$th percentile of the standard normal distribution.

Formulas are adapted from Julious, Steven A. "Sample sizes for clinical trials with normal data." Statistics in medicine 23.12 (2004): 1921-1986.
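As a rough illustration, the three power functions can be sketched in Python with only the standard library. This is not the code behind the visualization. The function names are my own, and clamping the equivalence power at zero is an assumption I added because the sum of the two Φ terms minus 1 can drop below zero for small n. The equivalence formula is written with the absolute values as given above, and I have only checked it for effects inside the margin.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def _z(alpha: float) -> float:
    """Return the 100(1 - alpha)th percentile of the standard normal."""
    return _STD_NORMAL.inv_cdf(1 - alpha)


def power_superiority(d: float, n: int, alpha: float = 0.025) -> float:
    """Power of the one-sided superiority test; n is the size per group."""
    return _STD_NORMAL.cdf(d / sqrt(2 / n) - _z(alpha))


def power_noninferiority(d: float, delta: float, n: int,
                         alpha: float = 0.025) -> float:
    """Power of the non-inferiority test with margin -delta."""
    return _STD_NORMAL.cdf((d + delta) / sqrt(2 / n) - _z(alpha))


def power_equivalence(d: float, delta: float, n: int,
                      alpha: float = 0.025) -> float:
    """Power of the equivalence test with margin [-delta, delta]."""
    se = sqrt(2 / n)
    z = _z(alpha)
    raw = (_STD_NORMAL.cdf(abs(d - delta) / se - z)
           + _STD_NORMAL.cdf(abs(d + delta) / se - z)
           - 1)
    # Assumption (not from the text): report negative values as zero power.
    return max(0.0, raw)


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show how power grows with n.
    for n in (10, 50, 100, 500):
        print(n,
              round(power_superiority(0.5, n), 3),
              round(power_noninferiority(0.0, 0.3, n), 3),
              round(power_equivalence(0.0, 0.3, n), 3))
```

With d = 0 the non-inferiority power for margin Δ equals the superiority power for an effect of size Δ, which is a quick way to sanity-check the functions against each other.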

Type I error

Non-inferiority is shown if the lower limit of a two-sided (1–2α)×100% CI is above -Δ. Here that means a 95 % CI, so the significance level is 0.025. Using the two one-sided tests (TOST) procedure, equivalence is also tested using a (1–2α)×100% CI, so its significance level is 0.025 as well. In the visualization, superiority testing is performed as a one-tailed test, again with a significance level of 0.025. So if we wanted a 0.05 significance level, we would use 90 % CIs.
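The CI-based decision rules can be sketched as a small Python function. This is my own illustration, not the visualization's source. It assumes the same normal approximation as the power functions, with standard error √(2/n) for d, and the names `confidence_interval` and `decisions` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(d: float, n: int, level: float = 0.95):
    """Two-sided normal-approximation CI for d, with n per group."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(2 / n)  # assumed standard error of d is sqrt(2/n)
    return d - half_width, d + half_width


def decisions(d: float, n: int, delta: float, level: float = 0.95) -> dict:
    """Apply the three decision rules to the CI around the observed d."""
    lower, upper = confidence_interval(d, n, level)
    return {
        # Superiority: the whole CI lies above 0.
        "superior": lower > 0,
        # Non-inferiority: the lower limit lies above -delta.
        "non_inferior": lower > -delta,
        # Equivalence: the whole CI lies inside (-delta, delta).
        "equivalent": lower > -delta and upper < delta,
    }


if __name__ == "__main__":
    # Effect 0.5, n = 10 per group, margin 0.3.
    print(confidence_interval(0.5, 10))
    print(decisions(0.5, 10, 0.3))
```

For an effect of 0.5 with n = 10 and Δ = 0.3 the interval is roughly -0.38 to 1.38 and none of the three rules is met, so the result is inconclusive. Passing `level=0.90` would correspond to a 0.05 significance level for each test.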

Support my work

The content on this blog is shared for free under a CC-BY license. If you like my work and want to support it you can:

Buy me a coffee (or use PayPal)

You can also sponsor my open source work using GitHub Sponsors

Suggestions, errors, and bugs

Have any suggestions? Or found any bugs? Send them to me; my contact info can be found here.